Abstract

Sparse coding, which is the decomposition of a vector 
using only a few basis elements, is widely used in machine 
learning and image processing. The basis set, also called a
dictionary, is learned to adapt to specific data. This approach
has proven to be very effective in many image processing
tasks. Traditionally, the dictionary is an unstructured
"flat" set of atoms. In this paper, we study structured
dictionaries [ ] which are obtained from an epitome [ ],
or a set of epitomes. The epitome is itself a small image, and 
the atoms are all the patches of a chosen size inside this im- 
age. This considerably reduces the number of parameters to 
learn and provides sparse image decompositions with shift- 
invariance properties. We propose a new formulation and 
an algorithm for learning the structured dictionaries asso- 
ciated with epitomes, and illustrate their use in image de- 
noising tasks. 



1. Introduction 

Jojic, Frey and Kannan [ ] introduced in 2003 a probabilistic
generative image model called an epitome. Intuitively,
the epitome is a small image that summarizes the
content of a larger one, in the sense that for any patch from 
the large image there should be a similar one in the epit- 
ome. This is an intriguing notion, which has been applied 
to image reconstruction tasks [ ], and epitomes have also 
been extended to the video domain [ ], where they have 
been used in denoising, superresolution, object removal 
and video interpolation. Other successful applications of 
epitomes include location recognition [ ] or face recogni- 
tion [ ]. 

Figure 1. A "flat" dictionary (left) vs. an epitome (right). Sparse coding with an epitome is similar to sparse coding with a flat dictionary, except that the atoms are extracted from the epitome and may overlap, instead of being chosen from an unstructured set of patches and assumed to be independent of one another.

Aharon and Elad [ ] have introduced an alternative formulation within the sparse coding framework, called the image-signature dictionary, and applied it to image denoising.
Their formulation unifies the concept of epitome and dictio- 
nary learning [ , ] by allowing an image patch to be rep- 
resented as a sparse linear combination of several patches 
extracted from the epitome (Figure 1). The resulting sparse 
representations are highly redundant (there are as many dic- 
tionary elements as overlapping patches in the epitome), 
with dictionaries represented by a reasonably small number 
of parameters (the number of pixels in the epitome). Such a 
representation has also proven to be useful for texture syn- 
thesis [ ]. 

In a different line of work, some research has focused on learning shift-invariant dictionaries [ , ], in the sense that it is possible to use dictionary elements with different shifts to represent signals exhibiting patterns that may appear several times at different positions. While this is different from the image-signature dictionaries of Aharon and Elad [ ], the two ideas are related, and as shown in this paper, such a shift invariance can be achieved by using a collection of smaller epitomes. In fact, one of our main contributions is to unify the frameworks of epitome and dictionary learning, and establish the continuity between dictionaries, dictionaries with shift invariance, and epitomes.

We propose a formulation based on the concept of epitomes/image-signature dictionaries introduced by [ , ], which allows learning a collection of epitomes, and which is generic enough to be used with epitomes that may have different shapes, or with different dictionary parameterizations. We present this formulation for the specific case of image patches for simplicity, but it applies to spatio-temporal blocks in a straightforward manner.

The following notation is used throughout the paper: we define for $q \geq 1$ the $\ell_q$-norm of a vector $x$ in $\mathbb{R}^m$ as $\|x\|_q \triangleq (\sum_{j=1}^{m} |x_j|^q)^{1/q}$, where $x_j$ denotes the $j$-th coordinate of $x$. If $X$ is a matrix in $\mathbb{R}^{m \times n}$, $x^i$ will denote its $i$-th row, while $x_j$ will denote its $j$-th column. As usual, $x_{ij}$ will denote the entry of $X$ at the $i$-th row and $j$-th column. We consider the Frobenius norm of $X$: $\|X\|_F \triangleq (\sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij}^2)^{1/2}$.

This paper is organized as follows: Section 2 introduces 
our formulation. We present our dictionary learning algo- 
rithm in Section 3. Section 4 introduces different improve- 
ments for this algorithm, and Section 5 demonstrates exper- 
imentally the usefulness of our approach. 

2. Proposed Approach 

Given a set of $n$ training image patches of size $m$ pixels, represented by the columns of a matrix $X = [x_1, \ldots, x_n]$ in $\mathbb{R}^{m \times n}$, the classical dictionary learning formulation, as introduced by [ ] and revisited by [ ], tries to find a dictionary $D = [d_1, \ldots, d_p]$ in $\mathbb{R}^{m \times p}$ such that each signal $x_i$ can be represented by a sparse linear combination of the columns of $D$. More precisely, the dictionary $D$ is learned along with a matrix of decomposition coefficients $A = [\alpha_1, \ldots, \alpha_n]$ in $\mathbb{R}^{p \times n}$ such that $x_i \approx D\alpha_i$ for every signal $x_i$. Following [ ], we consider the following formulation:

$$\min_{D \in \mathcal{D},\, A \in \mathbb{R}^{p \times n}} \sum_{i=1}^{n} \Big[ \frac{1}{2} \|x_i - D\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1 \Big], \qquad (1)$$

where the quadratic term ensures that the vectors $x_i$ are close to the approximation $D\alpha_i$, the $\ell_1$-norm induces sparsity in the coefficients $\alpha_i$ (see, e.g., [4, 24]), and $\lambda$ controls the amount of regularization. To prevent the columns of $D$ from being arbitrarily large (which would lead to arbitrarily small values of the $\alpha_i$), the dictionary $D$ is constrained to belong to the convex set $\mathcal{D}$ of matrices in $\mathbb{R}^{m \times p}$ whose columns have an $\ell_2$-norm less than or equal to one:

$$\mathcal{D} \triangleq \{ D \in \mathbb{R}^{m \times p} \ \text{s.t.} \ \forall j = 1, \ldots, p, \ \|d_j\|_2 \leq 1 \}.$$



As will become clear shortly, this constraint is not 
adapted to dictionaries extracted from epitomes, since over- 
lapping patches cannot be expected to all have the same 
norm. Thus we introduce an unconstrained formulation 
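equivalent to Eq. (1):

$$\min_{D \in \mathbb{R}^{m \times p},\, A \in \mathbb{R}^{p \times n}} \sum_{i=1}^{n} \Big[ \frac{1}{2} \|x_i - D\alpha_i\|_2^2 + \lambda \sum_{j=1}^{p} \|d_j\|_2\, |\alpha_{ji}| \Big], \qquad (2)$$

where $\alpha_{ji}$ denotes the $j$-th entry of $\alpha_i$.
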
This formulation removes the constraint $D \in \mathcal{D}$ from Eq. (1), and replaces the $\ell_1$-norm by a weighted $\ell_1$-norm. As shown in Appendix A, Eq. (1) and Eq. (2) are equivalent in the sense that a solution of Eq. (1) is also a solution of Eq. (2), and for every solution of Eq. (2), a solution of Eq. (1) can be obtained by normalizing its columns to one. To the best of our knowledge, this equivalent formulation is new, and is key to learning an epitome with $\ell_1$-regularization: the use of a convex regularizer (the $\ell_1$-norm), which empirically provides better-behaved dictionaries than $\ell_0$ (where the $\ell_0$ pseudo-norm counts the number of non-zero elements in a vector) for denoising tasks (see Table 1), differentiates us from the ISD formulation of [ 1]. To prevent degenerate solutions in the dictionary learning formulation with the $\ell_1$-norm, it is important to constrain the dictionary elements with the $\ell_2$-norm. Whereas such a constraint can easily be imposed in classical dictionary learning, its extension to epitome learning is not straightforward, and the original ISD formulation is not compatible with convex regularizers. Eq. (2) is an equivalent unconstrained formulation, which lends itself well to epitome learning.

We can now formally introduce the general concept of an epitome as a small image of size $\sqrt{M} \times \sqrt{M}$, encoded (for example in row order) as a vector $E$ in $\mathbb{R}^M$. We also introduce a linear operator $\varphi : \mathbb{R}^M \to \mathbb{R}^{m \times p}$ that extracts all overlapping patches from the epitome $E$, and rearranges them into the columns of a matrix of $\mathbb{R}^{m \times p}$, the integer $p$ being the number of such overlapping patches. Concretely, we have $p = (\sqrt{M} - \sqrt{m} + 1)^2$. In this context, $\varphi(E)$ can be interpreted as a traditional flat dictionary with $p$ elements, except that it is generated by a small number $M$ of parameters compared to the $pm$ parameters of the flat dictionary. Our approach generalizes to a much wider range of epitomic structures using any mapping $\varphi$ that admits fast projections on $\operatorname{Im}\varphi$. The functions $\varphi$ we have used so far are relatively simple, but give a framework that easily extends to families of epitomes, shift-invariant dictionaries, and plain dictionaries. This list is not exhaustive, which naturally opens up new perspectives. The only assumption we make is that $\varphi$ is a linear operator of rank $M$ (i.e., $\varphi$ is injective). The fact that a dictionary $D$ is obtained from an epitome is characterized by the fact that $D$ is in the image $\operatorname{Im}\varphi$ of the linear operator $\varphi$. Given a dictionary $D$ in $\operatorname{Im}\varphi$, the unique (by injectivity of $\varphi$) epitome representation can be obtained by computing the inverse of $\varphi$ on $\operatorname{Im}\varphi$, for which a closed form using pseudo-inverses exists, as shown in Appendix B.
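
As an illustration, the patch-extraction operator for a single square epitome can be sketched in a few lines of NumPy. This is a minimal sketch rather than the implementation used in our experiments; the function name `extract_patches` and the toy sizes (a 6 x 6 epitome, 3 x 3 patches) are assumptions made for the example.

```python
import numpy as np

def extract_patches(epitome, w):
    """Map a square epitome (s x s array) to a (w*w, p) dictionary whose
    columns are all overlapping w x w patches, each flattened in row order."""
    s = epitome.shape[0]
    k = s - w + 1                      # number of patch positions per axis
    cols = [epitome[r:r + w, c:c + w].reshape(-1)
            for r in range(k) for c in range(k)]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
E = rng.standard_normal((6, 6))       # M = 36 epitome parameters
D = extract_patches(E, 3)             # m = 9 pixels per patch
print(D.shape)                        # (9, 16), since p = (6 - 3 + 1)**2 = 16
```

The dictionary has 9 x 16 = 144 entries but only 36 free parameters, which is the parameter reduction discussed above.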

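Our goal being to adapt the epitome to the training image patches, the general minimization problem can therefore be expressed as follows:

$$\min_{D \in \operatorname{Im}\varphi,\, A \in \mathbb{R}^{p \times n}} \sum_{i=1}^{n} \Big[ \frac{1}{2} \|x_i - D\alpha_i\|_2^2 + \lambda \sum_{j=1}^{p} \|d_j\|_2\, |\alpha_{ji}| \Big]. \qquad (3)$$
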
There are several motivations for such an approach. As discussed above, the choice of the function $\varphi$ lets us adapt this technique to different problems such as multiple epitomes or any other type of dictionary representation. This formulation is therefore deliberately generic. In practice, we have mainly focused on two simple cases in the experiments of this paper: a single epitome [ 1 ] (or image-signature dictionary [ ]) and a set of epitomes. Furthermore, we have now reduced the problem to a more traditional and well-studied one: dictionary learning. We will therefore use the techniques and algorithms developed in the dictionary learning literature to solve the epitome learning problem.

3. Basic Algorithm 

As for classical dictionary learning, the optimization 
problem of Eq. (3) is not jointly convex in (D, A), but is 
convex with respect to D when A is fixed and vice-versa. 
A block-coordinate descent scheme that alternates between 
the optimization of D and A, while keeping the other pa- 
rameter fixed, has emerged as a natural and simple way for 
learning dictionaries [ , ], which has proven to be rela- 
tively efficient when the training set is not too large. Even 
though the formulation remains nonconvex and therefore 
this method is not guaranteed to find the global optimum, 
it has proven experimentally to be good enough for many 
tasks [ ]. 

We therefore adopt this optimization scheme as well, and detail the different steps below. Note that other algorithms such as stochastic gradient descent (see [ ]) could be used as well, and in fact can easily be derived from the material of this section. However, we have chosen not to investigate this kind of technique for the sake of simplicity. Indeed, stochastic gradient descent algorithms are potentially more efficient than the block-coordinate scheme mentioned above, but require the (sometimes non-trivial) tuning of a learning rate.

3.1. Step 1: Optimization of A with D Fixed. 

In this step of the algorithm, $D$ is fixed, so the constraint $D \in \operatorname{Im}\varphi$ is not involved in the optimization of $A$. Furthermore, note that updating the matrix $A$ consists of solving $n$ independent optimization problems with respect to each column $\alpha_i$. For each of them, one has to solve a weighted-$\ell_1$ optimization problem. Let us consider the update of a column $\alpha_i$ of $A$.

We introduce the matrix $\Gamma = \operatorname{diag}[\|d_1\|_2, \ldots, \|d_p\|_2]$, and define $D' = D\Gamma^{-1}$. If $\Gamma$ is non-singular, we show in Appendix A that the relation $\alpha_i'^{\star} = \Gamma \alpha_i^{\star}$ holds, where

$$\alpha_i'^{\star} = \operatorname*{argmin}_{\alpha_i' \in \mathbb{R}^p} \frac{1}{2} \|x_i - D'\alpha_i'\|_2^2 + \lambda \|\alpha_i'\|_1, \quad \text{and}$$

$$\alpha_i^{\star} = \operatorname*{argmin}_{\alpha_i \in \mathbb{R}^p} \frac{1}{2} \|x_i - D\alpha_i\|_2^2 + \lambda \sum_{j=1}^{p} \|d_j\|_2\, |\alpha_{ji}|.$$

This shows that the update of each column can easily be obtained with classical solvers for $\ell_1$-decomposition problems. We use to that effect the LARS algorithm [ ], implemented in the software accompanying [ ].
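
The change of variables can be checked numerically on a toy problem. The sketch below is only illustrative: it swaps LARS for a plain ISTA (iterative soft-thresholding) loop to keep the example self-contained, and it uses a small overdetermined dictionary (20 x 8) so that both problems have a unique solution that ISTA reaches quickly. The names `ista` and `soft_threshold` and all sizes are assumptions of the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Entrywise soft-thresholding, the proximal operator of t * |.|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, thresholds, n_iter=20000):
    """Minimize 0.5*||x - D a||_2^2 + sum_j thresholds[j]*|a_j| with ISTA."""
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - grad / L, thresholds / L)
    return a

rng = np.random.default_rng(0)
m, p, lam = 20, 8, 0.1
D = rng.standard_normal((m, p)) * rng.uniform(0.5, 2.0, size=p)  # unequal column norms
x = rng.standard_normal(m)

norms = np.linalg.norm(D, axis=0)                     # diagonal of Gamma
alpha_weighted = ista(D, x, lam * norms)              # weighted l1 problem with D
alpha_prime = ista(D / norms, x, lam * np.ones(p))    # plain l1 problem with D' = D Gamma^-1
print(np.max(np.abs(alpha_prime - norms * alpha_weighted)))
```

The printed value is the largest deviation from the relation between the two minimizers; it should be at the level of the solver tolerance.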

Since our optimization problem is invariant under multiplying $D$ by a scalar and $A$ by its inverse, we then apply the following renormalization to ensure numerical stability and prevent the entries of $D$ and $A$ from becoming too large: we rescale $D$ and $A$ with

$$s = \min_{j \in \{1, \ldots, p\}} \|d_j\|_2, \quad \text{and define} \quad D \leftarrow \frac{1}{s} D \quad \text{and} \quad A \leftarrow sA.$$

Since the image of $\varphi$ is a vector space, $D$ stays in the image of $\varphi$ after the normalization. And as noted before, it does not change the value of the objective function.

3.2. Step 2: Optimization of D with A Fixed. 

We use a projected gradient descent algorithm [ ] to update $D$. The objective function $f$ minimized during this step can be written as:

$$f(D) \triangleq \frac{1}{2} \|X - DA\|_F^2 + \lambda \sum_{j=1}^{p} \|d_j\|_2\, \|\alpha^j\|_1, \qquad (4)$$

where $A$ is fixed, and we recall that $\alpha^j$ denotes its $j$-th row. The function $f$ is differentiable, except when a column of $D$ is equal to zero, which we assume without loss of generality not to be the case. Suppose indeed that a column $d_j$ of $D$ is equal to zero. Then, without changing the value of the cost function of Eq. (3), one can set the corresponding row $\alpha^j$ to zero as well, and it results in a function $f$ defined in Eq. (4) that does not depend on $d_j$ anymore. We have, however, not observed such a situation in our experiments.

The function $f$ can therefore be considered as differentiable, and one can easily compute its gradient as:

$$\nabla f(D) = -(X - DA)A^T + D\Delta,$$

where $\Delta$ is defined as $\Delta \triangleq \operatorname{diag}\big(\lambda \frac{\|\alpha^1\|_1}{\|d_1\|_2}, \ldots, \lambda \frac{\|\alpha^p\|_1}{\|d_p\|_2}\big)$.
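
The gradient formula can be checked against a finite-difference approximation. The following sketch assumes random toy data and hypothetical helper names (`objective`, `gradient`); it is only meant to illustrate the expression above.

```python
import numpy as np

def objective(D, X, A, lam):
    """f(D) = 0.5*||X - D A||_F^2 + lam * sum_j ||d_j||_2 * ||alpha^j||_1."""
    col_norms = np.linalg.norm(D, axis=0)      # ||d_j||_2 for each column
    row_l1 = np.abs(A).sum(axis=1)             # ||alpha^j||_1 for each row of A
    return 0.5 * np.sum((X - D @ A) ** 2) + lam * np.sum(col_norms * row_l1)

def gradient(D, X, A, lam):
    """-(X - D A) A^T + D Delta, with Delta = diag(lam*||alpha^j||_1/||d_j||_2)."""
    col_norms = np.linalg.norm(D, axis=0)
    delta = lam * np.abs(A).sum(axis=1) / col_norms
    return -(X - D @ A) @ A.T + D * delta      # broadcasting scales column j by delta[j]

rng = np.random.default_rng(0)
m, p, n, lam = 5, 7, 11, 0.3
X = rng.standard_normal((m, n))
D = rng.standard_normal((m, p))
A = rng.standard_normal((p, n))

# Central finite difference of f along a random direction H.
H = rng.standard_normal((m, p))
eps = 1e-6
numeric = (objective(D + eps * H, X, A, lam) - objective(D - eps * H, X, A, lam)) / (2 * eps)
analytic = np.sum(gradient(D, X, A, lam) * H)
print(abs(numeric - analytic))
```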

To use a projected gradient descent, we now need a method for projecting $D$ onto the convex set $\operatorname{Im}\varphi$, and the update rule becomes:

$$D \leftarrow \Pi_{\operatorname{Im}\varphi}[D - \rho \nabla f(D)],$$

where $\Pi_{\operatorname{Im}\varphi}$ is the orthogonal projector onto $\operatorname{Im}\varphi$, and $\rho$ is a gradient step, chosen with a line-search rule, such as the Armijo rule [ ].

Interestingly, in the case of the single epitome (and in fact in any other extension where $\varphi$ is a linear operator that extracts some patches from a parameter vector $E$), this projector admits a closed form: let us consider the linear operator $\varphi^* : \mathbb{R}^{m \times p} \to \mathbb{R}^M$, such that for a matrix $D$ in $\mathbb{R}^{m \times p}$, a pixel of the epitome $\varphi^*(D)$ is the average of the entries of $D$ corresponding to this pixel value. We give the formal form of this operator in Appendix B, and show the following results:

(i) $\varphi^*$ is indeed linear,

(ii) $\Pi_{\operatorname{Im}\varphi} = \varphi \circ \varphi^*$.

With this closed form of $\Pi_{\operatorname{Im}\varphi}$ in hand, we now have an efficient algorithmic procedure for performing the projection. Our method is therefore quite generic, and can adapt to a wide variety of functions $\varphi$. Extending it to the case where $\varphi$ is not linear, but still injective and with an efficient method to project on $\operatorname{Im}\varphi$, will be the topic of future work.
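
For a single square epitome, the averaging operator and the resulting projector can be sketched as follows. This is a minimal illustration under assumed names (`phi`, `phi_star`, `project`) and toy sizes, not the code used in our experiments.

```python
import numpy as np

def phi(E, w):
    """All overlapping w x w patches of the square epitome E, as columns."""
    k = E.shape[0] - w + 1
    return np.stack([E[r:r + w, c:c + w].reshape(-1)
                     for r in range(k) for c in range(k)], axis=1)

def phi_star(D, s, w):
    """Average, for each pixel of an s x s epitome, the entries of D
    that correspond to this pixel."""
    k = s - w + 1
    total = np.zeros((s, s))
    count = np.zeros((s, s))
    for idx in range(k * k):
        r, c = divmod(idx, k)                  # same patch ordering as phi
        total[r:r + w, c:c + w] += D[:, idx].reshape(w, w)
        count[r:r + w, c:c + w] += 1.0
    return total / count

def project(D, s, w):
    """Orthogonal projection onto Im(phi), computed as phi(phi_star(D))."""
    return phi(phi_star(D, s, w), w)

rng = np.random.default_rng(0)
s, w = 6, 3
E = rng.standard_normal((s, s))
D = rng.standard_normal((w * w, (s - w + 1) ** 2))   # arbitrary flat dictionary
P = project(D, s, w)
print(np.abs(project(P, s, w) - P).max())            # projecting twice changes nothing
```

A projected gradient step then reads `D = project(D - rho * grad, s, w)` for a step size `rho` and a gradient `grad` of the same shape as `D`.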

4. Improvements 

We present in this section several improvements to our 
basic framework, which either improve the convergence 
speed of the algorithm, or generalize the formulation. 

4.1. Accelerated Gradient Method for Updating D. 

A first improvement is to accelerate the convergence 
of the update of D using an accelerated gradient tech- 
nique [ , ] . These methods, which build upon early works 
by Nesterov [ ], have attracted a lot of attention recently 
in machine learning and signal processing, especially be- 
cause of their fast convergence rate (which is proven to be 
optimal among first-order methods), and their ability to deal 
with large, possibly nonsmooth problems. 

Whereas the value of the objective function with classical
gradient descent algorithms for solving smooth convex
problems is guaranteed to decrease with a convergence rate
of $O(1/k)$, where $k$ is the number of iterations, other
algorithmic schemes have been proposed with a convergence
rate of $O(1/k^2)$ with the same cost per iteration as classical
gradient algorithms [2, 18, 19]. The difference between
these methods and gradient descent algorithms is that two 
sequences of parameters are maintained during this iterative 
procedure, and that each update uses information from past 
iterations. This leads to theoretically better convergence 
rates, which are often also better in practice. 

We have chosen here for its simplicity the FISTA algorithm of Beck and Teboulle [ ], which includes a practical line-search scheme for automatically tuning the gradient step. We have indeed observed that FISTA converged significantly faster than the projected gradient descent algorithm.
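
The two-sequence structure of such accelerated schemes can be sketched as follows. The example makes several simplifying assumptions that differ from our setting: it uses a constant step $1/L$ instead of a line search, a smooth least-squares objective without the weighted penalty, and the identity as projector (the projector onto $\operatorname{Im}\varphi$ would be passed as `project` instead). The function name `fista` and the toy data are hypothetical.

```python
import numpy as np

def fista(grad, project, D0, L, n_iter):
    """Accelerated projected gradient with constant step 1/L.
    Two sequences are maintained: the iterates D and the extrapolated points Y."""
    D = D0.copy()
    Y = D0.copy()
    t = 1.0
    for _ in range(n_iter):
        D_next = project(Y - grad(Y) / L)                 # gradient step at Y
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        Y = D_next + ((t - 1.0) / t_next) * (D_next - D)  # uses the previous iterate
        D, t = D_next, t_next
    return D

rng = np.random.default_rng(0)
m, p, n = 4, 5, 30
X = rng.standard_normal((m, n))
A = rng.standard_normal((p, n))

f = lambda D: 0.5 * np.sum((X - D @ A) ** 2)
grad = lambda D: -(X - D @ A) @ A.T
L = np.linalg.norm(A @ A.T, 2)                 # Lipschitz constant of grad
D_hat = fista(grad, lambda D: D, np.zeros((m, p)), L, n_iter=2000)
D_star = X @ A.T @ np.linalg.inv(A @ A.T)      # closed-form least-squares solution
print(f(D_hat) - f(D_star))                    # optimality gap after 2000 iterations
```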

4.2. Multi-Scale Version 

To improve the results without increasing the computing 
time, we have also implemented a multi-scale approach that 
exploits the spatial nature of the epitome. Instead of directly 
learning an epitome of size M, we first learn an epitome of a 
smaller size on a reduced image with corresponding smaller 
patches, and after upscaling, we use the resulting epitome as
the initialization for the next scale. We iterate this process
in practice two to three times. The procedure is illustrated in 
Figure 2. Intuitively, learning smaller epitomes is an easier 
task than directly learning a large one, and such a procedure 
provides a good initialization for learning a large epitome. 



Multi-scale Epitome Learning.

Input: n number of scales, r ratio between each scale,
  E_0 random initialization for the first scale.
for k = 1 to n do
  Given I_k rescaling of image I for ratio 1/r^(n-k),
  X_k the corresponding patches,
  initialize with E = upscale(E_{k-1}, r),
  E_k = epitome(X_k, E).
end for
Output: learned epitome E.



Figure 2. Multi-scale epitome learning algorithm. 

4.3. Multi-Epitome Extension 

Another improvement is to consider not a single epitome 
but a family of epitomes in order to learn dictionaries with 
some shift invariance, which has been the focus of recent 
work [ , ]. Note that different types of structured dictionaries
have also been proposed with the same motivation
for learning shift-invariant features in image classification 
tasks [ ], but in a significantly different framework (the 
structure in the dictionaries learned in [ ] comes from a 
different sparsity-inducing penalization). 

Figure 3. A "flat" dictionary (left) vs. a collection of 4 epitomes (right). The atoms are extracted from the epitomes and may overlap.

As mentioned before, we are able to learn a set of epitomes instead of a single one by changing the function $\varphi$ introduced earlier. The vector $E$ now contains the pixels (parameters) of several small epitomes, and $\varphi$ is the linear operator that extracts all overlapping patches from all epitomes. In the same way, the projector on $\operatorname{Im}\varphi$ is still easy to compute in closed form, and the rest of the algorithm stays unchanged. Other "epitomic" structures could easily be used within our framework, even though we have limited ourselves for simplicity to the case of single and multiple epitomes of the same size and shape.

The multi-epitome version of our approach can be seen as an interpolation between a classical dictionary and a single epitome. Indeed, defining a multitude of epitomes of the same size as the considered patches is equivalent to working with a dictionary. Defining a large number of epitomes slightly larger than the patches is equivalent to working with shift-invariant dictionaries. In Section 5, we experimentally compare these different regimes for the task of image denoising.
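
The three regimes can be made concrete by counting the atoms produced by a multi-epitome extraction operator. The sketch below uses a hypothetical helper `phi_multi` and arbitrary toy sizes; it only illustrates how the epitome size controls the structure of the dictionary.

```python
import numpy as np

def phi_multi(epitomes, w):
    """Stack, as columns, all overlapping w x w patches of every square epitome."""
    cols = []
    for E in epitomes:
        k = E.shape[0] - w + 1
        cols += [E[r:r + w, c:c + w].reshape(-1)
                 for r in range(k) for c in range(k)]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
w = 4
# Epitomes of the patch size: one atom each, i.e. a flat dictionary.
flat = phi_multi([rng.standard_normal((w, w)) for _ in range(10)], w)
# Slightly larger epitomes: each one yields 4 shifted, overlapping atoms.
shifted = phi_multi([rng.standard_normal((w + 1, w + 1)) for _ in range(10)], w)
# A single large epitome.
single = phi_multi([rng.standard_normal((12, 12))], w)
print(flat.shape, shifted.shape, single.shape)   # (16, 10) (16, 40) (16, 81)
```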

4.4. Initialization 

Because of the nonconvexity of the optimization problem, the question of the initialization is an important issue in epitome learning. We have already mentioned a multi-scale strategy to overcome this issue, but for the first scale, the problem remains. Whereas classical flat dictionaries can naturally be initialized with prespecified dictionaries such as an overcomplete DCT basis (see [ ]), the epitome does not admit such a natural choice. In all the experiments (unless stated otherwise), we use as the initialization a single epitome (or a collection of epitomes), common to all experiments, which is learned using our algorithm, initialized with a Gaussian low-pass filtered random image, on a set of 100 000 random patches extracted from 5 000 natural images (all different from the test images used for denoising).

5. Experimental Validation 
Figure 4. House, Peppers, Cameraman, Lena, Boat and Barbara
images. 



We provide in this section qualitative and quantitative 
validation. We first study the influence of the different 
model hyperparameters on the visual aspect of the epitome 
before moving to an image denoising task. We choose to 
represent the epitomes as images in order to visualize more 
easily the patches that will be extracted to form the images. 
Since epitomes contain negative values, they are arbitrarily
rescaled between 0 and 1 for display.



In this section, we will work with several images, which 
are shown in Figure 4. 

5.1. Influence of the Initialization 

In order to measure the influence of the initialization on the resulting epitome, we have run the same experiment with different initializations. Figure 5 shows the different results obtained.

The difference in contrast may be due to the scaling of the data in the displaying process. This experiment illustrates that different initializations lead to visually different epitomes. Whereas this property might not be desirable, the classical dictionary learning framework also suffers from this issue, yet has led to successful applications in image processing [ ].




Figure 5. Three epitomes obtained on the boat image for different initializations, with all other parameters identical. Left: epitome obtained when initializing with an epitome learned on random patches from natural images. Middle and right: epitomes obtained for two different random initializations.



5.2. Influence of the Size of the Patches 

The size of the patches seems to play an important role in the visual aspect of the epitome. We illustrate in Figure 6 an experiment where pairs of epitomes of size 46 x 46 are learned with different sizes of patches.




Figure 6. Pairs of epitomes of width 46 obtained for patches of width 6, 8, 9, 10 and 12. All other parameters are unchanged. Experiments were run with 2 scales (20 iterations for the first scale, 5 for the second) on the house image.



As we see, learning epitomes with small patches seems 
to introduce finer details and structures in the epitome, 
whereas large patches induce epitomes with coarser struc- 
tures. 

Figure 7. Collections of 1, 2, 4 and 20 epitomes learned on the barbara image with the same parameters. They are of sizes 42, 32, 25 and 15, respectively, in order to keep the same number of elements in D. They are not represented to scale.

5.3. Influence of the Number of Epitomes 

We present in this section an experiment where the number of learned epitomes varies, while keeping the same number of columns in $D$. The 1, 2, 4 and 20 epitomes learned on the image barbara are shown in Figure 7. When the number of epitomes is small, we observe in the epitomes some discontinuities between texture areas with different visual characteristics, which is not the case when learning several independent epitomes.

5.4. Application to Denoising 

In order to evaluate the performance of epitome learning in various regimes (single epitome, multiple epitomes), we use the same methodology as [ ], which uses the successful denoising method first introduced by [ ]. Let us first consider the classical problem of restoring a noisy image $y$ in $\mathbb{R}^n$ which has been corrupted by white Gaussian noise of standard deviation $\sigma$. We denote by $y_i$ in $\mathbb{R}^m$ the patch of $y$ centered at pixel $i$ (with any arbitrary ordering of the image pixels).

The method of [ ] proceeds as follows: 

• Learn a dictionary $D$ adapted to all overlapping patches $y_1, y_2, \ldots$ from the noisy image $y$.

• Approximate each noisy patch using the learned dictionary with a greedy algorithm called orthogonal matching pursuit (OMP) [ ] to obtain a clean estimate of every patch $y_i$ by addressing the following problem

$$\operatorname*{argmin}_{\alpha_i \in \mathbb{R}^p} \|\alpha_i\|_0 \quad \text{s.t.} \quad \|y_i - D\alpha_i\|_2^2 \leq (C\sigma^2),$$

where $D\alpha_i$ is a clean estimate of the patch $y_i$, $\|\alpha_i\|_0$ is the $\ell_0$ pseudo-norm of $\alpha_i$, and $C$ is a regularization parameter. Following [ ], we choose $C = 1.15$.

• Since every pixel in $y$ admits many clean estimates (one estimate for every patch the pixel belongs to), average the estimates.
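
The last two steps of this procedure can be sketched as follows. This is an illustrative toy version, not the code used in our experiments: the dictionary is a random orthonormal matrix rather than a learned one, atoms are assumed to have unit norm, the error threshold is passed directly as `eps`, and the names `omp` and `denoise` are hypothetical.

```python
import numpy as np

def omp(D, y, eps):
    """Greedy pursuit: add atoms one by one until ||y - D a||_2^2 <= eps."""
    a = np.zeros(D.shape[1])
    support, residual = [], y.copy()
    while residual @ residual > eps and len(support) < D.shape[0]:
        j = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        if j in support:                             # no further progress possible
            break
        support.append(j)
        # Least-squares refit of y on the selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        a[support] = coef
    return a

def denoise(noisy, D, w, eps):
    """Sparse-code every overlapping w x w patch, then average the estimates."""
    h, wd = noisy.shape
    total, count = np.zeros_like(noisy), np.zeros_like(noisy)
    for r in range(h - w + 1):
        for c in range(wd - w + 1):
            y = noisy[r:r + w, c:c + w].reshape(-1)
            estimate = D @ omp(D, y, eps)
            total[r:r + w, c:c + w] += estimate.reshape(w, w)
            count[r:r + w, c:c + w] += 1.0
    return total / count

rng = np.random.default_rng(0)
w = 3
D, _ = np.linalg.qr(rng.standard_normal((w * w, w * w)))   # toy orthonormal dictionary
noisy = rng.standard_normal((8, 8))
out = denoise(noisy, D, w, eps=0.5)
print(out.shape)
```

With `eps = 0` and a complete dictionary every patch is reproduced exactly, so the output equals the input; a very large `eps` selects no atom and returns an all-zero image.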




Figure 8. Artificially noised boat image (with standard deviation $\sigma = 15$), and the result of our denoising algorithm.



Quantitative results for single epitome and multi-scale multi-epitomes are presented in Table 1 on six images and five levels of noise. We evaluate the performance of the denoising process by computing the peak signal-to-noise ratio (PSNR) for each pair of images. For each level of noise, we have selected the best regularization parameter $\lambda$ over all six images, and have then used it in all the experiments. The PSNR values are averaged over 5 experiments with 5 different noise realizations. The mean standard deviation is 0.05 dB both for the single epitome and the multi-scale multi-epitomes.

We see from this experiment that the formulation we propose is competitive compared to the one of [ ]. Learning multiple epitomes instead of a single one seems to provide better results, which might be explained by the lack of flexibility of the single epitome representation. Evidently, these results are not as good as recent state-of-the-art denoising algorithms such as [ , ], which exploit more sophisticated image models. But our goal is to illustrate the performance of epitome learning on an image reconstruction task, in order to better understand these formulations.

                                             boat
IE   35.98   34.52   33.90   34.41   35.51   33.70
E    35.86   34.41   33.83   34.01   35.43   33.63

IE   34.45   32.50   31.65   32.23   33.74   31.81
E    34.32   32.36   31.59   31.84   33.66   31.75

IE   33.18   31.00   30.19   30.69   32.42   30.45
E    33.08   30.93   30.11   30.33   32.35   30.37

IE   32.02   29.82   29.08   29.49   31.36   29.36
E    31.96   29.77   29.01   29.14   31.29   29.30

IE   27.83   26.06   25.57   25.04   27.90   26.01
E    27.83   26.07   25.60   24.86   27.82   26.02

Table 1. PSNR results. First row: 20 epitomes of size 7 x 7 learned with 3 scales (IE: improved epitome); second row: single epitome of size 42 x 42 (E). Best results are in bold.

σ      IE      E       ISD     Jojic et al.   K-SVD   BM3D    LSSC
10     34.83   34.67   34.71   28.83          34.76   35.24   35.32
15     32.95   32.79   32.84   28.92          32.87   33.43   33.50
20     31.55   31.41   31.36   28.55          31.52   32.15   32.18
25     30.41   30.29   29.99   28.12          30.42   31.15   31.11
50     26.57   26.52   25.91   25.21          26.66   27.69   27.87
mean   31.26   31.14   30.96   27.93          31.25   31.93   32.00

Table 2. Quantitative comparative evaluation. PSNR values are averaged over 5 images. We compare ourselves to two previous epitome-learning-based algorithms: ISD ([ ]) and epitomes by Jojic, Frey and Kannan ([ ] as reported in [ ]), and to three more elaborate dictionary-learning-based algorithms: K-SVD ([ ]), BM3D ([ ]) and LSSC ([ ]).

6. Conclusion 

We have introduced in this paper a new formulation and 
an efficient algorithm for learning epitomes in the context of 
sparse coding, extending the work of Aharon and Elad [ ], 
and unifying it with recent work on shift-invariant dictio- 
nary learning. Our approach is generic, can interpolate be- 
tween these two regimes, and can possibly be applied to 
other formulations. Future work will extend our framework 
to the video setting, to other image processing tasks such as 
inpainting, and to learning image features for classification 
or recognition tasks, where shift invariance has proven to 
be a key property to achieving good results [12]. Another 
direction we are pursuing is to find a way to encode other
invariance properties through different mapping functions $\varphi$.

A. Appendix: $\ell_1$-Norm and Weighted $\ell_1$-Norm

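In this appendix, we will show the equivalence between the two minimization problems introduced in Section 3.1. Let us denote

$$F(D, \alpha) = \frac{1}{2} \|x - D\alpha\|_2^2 + \lambda \sum_{j=1}^{p} \|d_j\|_2\, |\alpha_j|, \qquad (5)$$

$$\text{and} \quad G(D, \alpha) = \frac{1}{2} \|x - D\alpha\|_2^2 + \lambda \|\alpha\|_1. \qquad (6)$$

Let us define $\alpha' \in \mathbb{R}^p$ and $D' \in \mathbb{R}^{m \times p}$ such that $D' = D\Gamma^{-1}$ and $\alpha' = \Gamma\alpha$, where $\Gamma = \operatorname{diag}[\|d_1\|_2, \ldots, \|d_p\|_2]$. The goal is to show that $\alpha'^{\star} = \Gamma\alpha^{\star}$, where:

$$\alpha^{\star} = \operatorname*{argmin}_{\alpha} F(D, \alpha), \quad \text{and} \quad \alpha'^{\star} = \operatorname*{argmin}_{\alpha'} G(D', \alpha').$$

We clearly have: $D\alpha = D'\alpha'$. Furthermore, since $\Gamma\alpha = \alpha'$, we have: $\forall j = 1, \ldots, p$, $\|d_j\|_2\, |\alpha_j| = |\alpha'_j|$. Therefore,

$$F(D, \alpha) = G(D', \alpha'). \qquad (7)$$

Moreover, since for all $D$, $D'$ is in the set $\mathcal{D}$, we have shown the equivalence between Eq. (1) and Eq. (2).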
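
The identity of Eq. (7) can be verified numerically at an arbitrary point. The sketch below uses random toy data; the function names `F` and `G` mirror Eqs. (5) and (6), and all sizes are assumptions of the example.

```python
import numpy as np

def F(D, alpha, x, lam):
    """Weighted l1 objective of Eq. (5)."""
    norms = np.linalg.norm(D, axis=0)
    return 0.5 * np.sum((x - D @ alpha) ** 2) + lam * np.sum(norms * np.abs(alpha))

def G(D, alpha, x, lam):
    """Plain l1 objective of Eq. (6)."""
    return 0.5 * np.sum((x - D @ alpha) ** 2) + lam * np.sum(np.abs(alpha))

rng = np.random.default_rng(0)
m, p, lam = 6, 9, 0.2
D = rng.standard_normal((m, p))
alpha = rng.standard_normal(p)
x = rng.standard_normal(m)

gamma = np.linalg.norm(D, axis=0)     # diagonal of Gamma
D_prime = D / gamma                   # D' = D Gamma^{-1}, unit-norm columns
alpha_prime = gamma * alpha           # alpha' = Gamma alpha
print(F(D, alpha, x, lam) - G(D_prime, alpha_prime, x, lam))
```

The difference is zero up to floating-point rounding, and the columns of `D_prime` all have unit norm, so `D_prime` satisfies the constraint of Eq. (1).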

B. Appendix: Projection on Im $\varphi$

In this appendix, we will show how to compute the orthogonal projection on the vector space $\operatorname{Im}\varphi$. Let us denote by $R_i$ the binary matrix in $\{0,1\}^{m \times M}$ that extracts the $i$-th patch from $E$. Note that with this notation, the matrix $R_i^T$ is a binary $M \times m$ matrix corresponding to a linear operator that takes a patch of size $m$ and places it at the location $i$ in an epitome of size $M$ which is zero everywhere else. We therefore have $\varphi(E) = [R_1 E, \ldots, R_p E]$.

We denote by $\varphi^* : \mathbb{R}^{m \times p} \to \mathbb{R}^M$ the linear operator defined as

$$\varphi^*(D) = \Big( \sum_{j=1}^{p} R_j^T R_j \Big)^{-1} \Big( \sum_{j=1}^{p} R_j^T d_j \Big),$$

which creates an epitome of size $M$ such that each pixel contains the average of the corresponding entries in $D$. Indeed, the $M \times M$ matrix $\sum_{j=1}^{p} R_j^T R_j$ is diagonal and the entry $i$ on the diagonal is the number of entries in $D$ corresponding to the pixel $i$ in the epitome.

Denoting by

$$R \triangleq \begin{bmatrix} R_1 \\ \vdots \\ R_p \end{bmatrix},$$

which is a $mp \times M$ matrix, we have $\operatorname{vec}(\varphi(E)) = RE$, where $\operatorname{vec}(D) = [d_1^T, \ldots, d_p^T]^T$, which is the vector of size $mp$ obtained by concatenating the columns of $D$, and also $\varphi^*(D) = (R^T R)^{-1} R^T \operatorname{vec}(D)$.

Since $\operatorname{vec}(\operatorname{Im}\varphi) = \operatorname{Im} R$ and $\operatorname{vec}(\varphi(\varphi^*(D))) = R (R^T R)^{-1} R^T \operatorname{vec}(D)$, which is an orthogonal projection onto $\operatorname{Im} R$, we obtain the following two properties, which are useful in our framework and classical in signal processing with overcomplete representations ([ 6]):

• $\varphi^*$ is the inverse function of $\varphi$ on $\operatorname{Im}\varphi$: $\varphi^* \circ \varphi = \mathrm{Id}$.

• $(\varphi \circ \varphi^*)$ is the orthogonal projector on $\operatorname{Im}\varphi$.
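
These properties can be checked by building the matrices $R_i$ explicitly for a small epitome. The sketch below is only an illustration with assumed names (`selection_matrices`) and toy sizes (a 5 x 5 epitome with 2 x 2 patches); in practice the dense matrix $R$ is never formed.

```python
import numpy as np

def selection_matrices(s, w):
    """Binary matrices R_i (m x M) extracting the i-th w x w patch of an
    s x s epitome stored in row order."""
    k = s - w + 1
    pixel = np.arange(s * s).reshape(s, s)     # index of each epitome pixel
    mats = []
    for r in range(k):
        for c in range(k):
            idx = pixel[r:r + w, c:c + w].reshape(-1)
            R_i = np.zeros((w * w, s * s))
            R_i[np.arange(w * w), idx] = 1.0   # one 1 per row, at the selected pixel
            mats.append(R_i)
    return mats

s, w = 5, 2
R = np.vstack(selection_matrices(s, w))        # (m p) x M stacked matrix
gram = R.T @ R                                 # equals sum_j R_j^T R_j
counts = np.diag(gram)                         # patches covering each pixel
P = R @ np.linalg.inv(gram) @ R.T              # matrix of phi o phi_star on vec(D)
print(R.shape, counts.reshape(s, s)[0])
```

Here `gram` is diagonal, `P` is symmetric and idempotent (an orthogonal projector), and `inv(gram) @ R.T @ R` is the identity, which is the statement that $\varphi^*$ inverts $\varphi$ on its image.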